I. INTRODUCTION AND LITERATURE REVIEW


A. PROBLEM 

Membership in a military organization is unique in many ways. One source of 
this uniqueness is the overriding importance of the mission of the armed forces: the 
protection of the nation’s vital interests, the deterrence of war, and the attainment of 
the nation’s objectives by the use of force if war should come. The means of
accomplishing the military mission, like many things, can be restricted by budget and
manpower considerations. The current budget deficits, together with future obligations 
connected with a military build-up, are forcing a review of the appropriate levels of 
reserve and active forces. 

To maintain readiness in the face of budget restrictions, military decision makers 
have been pursuing a policy of increasing reserve manning while maintaining a cap on 
active force end-strengths. Savings estimates resulting from placing military units in the 
reserve rather than the active forces are made generally from studies which compare 
current peacetime costs for existing similar units in the active and reserve forces. These 
estimates generally show that the saving achieved is a strong function of the type of 
unit and the required readiness or activity level. Units where the capital-labor mix is high
and where readiness demands high activity levels (more typical of Air Force and Navy
flight units) show savings of roughly 25% to 33% for reserve units, whereas more
labor-intensive units (typical of Army infantry units) show savings of as much as 70%
[Ref. 1: p. 220].

In addition, planners count on the assumption that reserve forces are less 
expensive than active forces to maintain because reservists are paid only for the time 
they actually spend at drills. Also, the contribution that reserve forces make to overall 
readiness has been increasing steadily since the inception of the voluntary force.
This is because escalating personnel costs have forced planners to limit the size of 
active forces, and the removal of the draft has diminished the capability of the active 
force to quickly expand and mobilize. Currently, any significant mobilization would 
require reserve augmentation of active forces almost immediately [Ref. 1: p. 8]. 

To meet this expanding role, reserve forces are organized into three categories:
the Ready Reserve, the Standby Reserve, and the Retired Reserve. The Ready Reserve
is the primary contributor to readiness and is composed of the Selected Reserve and
the Individual Ready Reserve (IRR). The IRR consists of individuals who train at 
irregular intervals and whose role is augmentation of existing units during mobilization. 
The Selected Reserve is the most significant component of the Reserve force and it 
consists of units which are organized and equipped to perform specific missions, 
trained personnel who augment active units, and individuals in training pipelines. 


The Selected Reserve of the Department of Defense is made up of six
components:

* Army National Guard
* Army Reserve
* Naval Reserve
* Marine Corps Reserve
* Air National Guard
* Air Force Reserve


The Selected Reserve contains combat and support units that would be vital to 
the successful operation of a major war. For example, the Selected Reserve contains: 

* Army: Combat divisions and brigades, armored cavalry regiments and numerous
support units. 

* Navy: Mine warfare ships, amphibious ships and anti-submarine patrol squadrons. 

* Marine Corps: A combat division, an air wing and support units. 

* Air Force: Fighter, interceptor, tanker and airlift squadrons.

Most members of the Selected Reserve are required to participate in training 
drills for 24 days a year and in two weeks of annual active duty for training. New 
enlistees who do not have previous military service also are required to undergo three 
or more months of initial entry training along with their Active force counterparts.
Each member is paid, according to grade, for participating in training. 

The impact of recent Defense manpower policy has been that while Active force 
levels have remained constant over the last decade, Selected Reserve end-strengths
have risen from 788,000 in 1978 to 1,100,652 in September 1985 [Ref. 2: p. I]. A 
breakdown of current Selected Reserve strength by components is shown in Table 1. 
This analysis will focus on the Selected Reserve. 

Future projections for all components show increases in end strengths for 
Selected Reserve Forces. For example, the Army manpower plan submitted in the 
February 1985 budget projected an increase of 116,000 members of the Army Selected 
Reserve (Army Reserve and Army National Guard) by 1990. This represents
an increase of 16 percent of current end strength over five years [Ref. 3: p. 1].
















TABLE 1
SELECTED RESERVE MANPOWER
September 1985

COMPONENT                                STRENGTH
Army National Guard                       439,952
United States Army Reserve                292,080
United States Navy Reserve                129,832
United States Marine Corps Reserve         41,586
Air National Guard                        109,398
United States Air Force Reserve            75,214
DoD Total                               1,088,062
United States Coast Guard Reserve          12,590
Total                                   1,100,652

Source: Defense Manpower Data Center,
Official Guard and Reserve Manpower Strengths and Statistics,
September 1985.
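
The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the Naval Reserve strength as 129,832, the value consistent with the printed grand total of 1,100,652 (the entry is partly illegible in the source table, so this is an inference), and it compares the Army's projected growth of 116,000 against the combined Army Selected Reserve strength.

```python
# Selected Reserve strengths from Table 1 (September 1985).
# ASSUMPTION: the Naval Reserve figure is inferred from the printed totals.
strength = {
    "Army National Guard": 439_952,
    "Army Reserve": 292_080,
    "Naval Reserve": 129_832,
    "Marine Corps Reserve": 41_586,
    "Air National Guard": 109_398,
    "Air Force Reserve": 75_214,
}
coast_guard_reserve = 12_590

dod_total = sum(strength.values())
grand_total = dod_total + coast_guard_reserve

# Army components as a share of the DoD Selected Reserve.
army_total = strength["Army National Guard"] + strength["Army Reserve"]
army_share = 100 * army_total / dod_total

# Projected Army increase of 116,000 relative to current Army strength.
army_growth = 100 * 116_000 / army_total

print(f"DoD total: {dod_total:,}")
print(f"Total including Coast Guard Reserve: {grand_total:,}")
print(f"Army share of DoD Selected Reserve: {army_share:.1f} percent")
print(f"Projected Army increase: {army_growth:.1f} percent")
```

Under that assumption the component strengths reproduce both printed totals, and the Army share and projected growth round to the 67 percent and 16 percent figures quoted in the text.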


Meeting these expansion requirements efficiently will depend upon a sound 
understanding of the impact of factors which affect Reserve force supply levels. At 
present that type of information is not available.

The econometric model is perhaps the most widely used technique for evaluating 
military personnel supply. Typically econometric manpower supply models attempt to 
estimate or predict the number of contracts signed by (or actual enlistments of) “high 
quality” young males based on variables deemed to be related to the enlistment
decision. This analysis will explore the use of cluster analysis to classify Army Reserve
Centers in relation to local accession factors. These procedures empirically form
“clusters” or groups of highly similar entities. The entities involved here are Reserve
Centers.

The analysis will explore models estimated for Army Selected Reserve data. This 
is because Army components represent 67 percent of current Selected Reserve 
manpower (see Table 1) and Army units are the best examples of units which are 
forced to survive within the confines of their local labor market. Air Force and Navy
Reserve units have more flexibility in recruiting for and manning units from outside
their local areas.

To illustrate the possible impact of the analysis, Table 2 provides a force profile
for the DoD Selected Reserve in 1985. An analysis of this type can assist policy
formulation in the following areas:


* serve as a source of hypotheses about accessions


* allocation of new authorizations across units 


* location of new units 


* allocation of recruiting resources across geographic areas 


TABLE 2
SELECTED RESERVE PROFILE - SEPTEMBER 1985

FORCE    ENLISTED      % FEM         AVG AGE       AVG YOS       % HSG
ARNG     [illegible]   [illegible]   30            8             [illegible]
USAR     238,220       [illegible]   [illegible]   7             50
USNR     106,529       [illegible]   [illegible]   9             64
USMCR    38,204        [illegible]   24            4             [illegible]
ANG      96,361        [illegible]   34            11            79
USAFR    [illegible]   [illegible]   33            10            76
DoD      936,525       [illegible]   [illegible]   [illegible]   [illegible]

Source: Defense Manpower Data Center,
Official Guard and Reserve Manpower Strengths and Statistics,
September 1985.


B. CLUSTER ANALYSIS



Clustering is the grouping of similar objects. The principal functions of clustering
are to name, to display, to summarize, to predict, and to aid in the interpretation of data
with many dimensions. Clustering techniques were first developed in the field of
biological taxonomy. Clustering is one of several methodologies included in the broader
category called classification.


The operational objective of clustering is to classify new observations, that is, to
recognize them as members of one category or another. This can be contrasted with
discriminant analysis, where some part of the structure is known and missing
information is estimated from labeled samples. In cluster analysis little or nothing is
known about the category structure. All that is available is a collection of
observations, described by their variables, whose category memberships are unknown.
The analysis seeks to discover a category structure which fits the observations. The
problem may be stated as one of finding the natural groups, which means sorting the
observations into groups such that the degree of “natural association” is high among
members of the same group and low between members of different groups.

Most of the well-known clustering techniques fall into one of two main
categories: (1) hierarchical and (2) nonhierarchical (partitioning) [Ref. 4: p. 124]. In
the former, every cluster obtained at any stage is a merger of clusters from
previous stages. The nonhierarchical procedures, however, form new clusters by lumping
and splitting old ones.
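
The hierarchical idea, in which each new cluster is a merger of clusters from earlier stages, can be sketched in a few lines. The example below is a minimal illustration using single-link agglomeration on five made-up points; the function name and the data are hypothetical and are not taken from the studies reviewed here.

```python
from itertools import combinations
from math import dist

def single_link_merges(points):
    """Agglomerative single-link clustering; returns the sequence of merges."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((merged, round(d, 3)))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical observations in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = single_link_merges(points)
for members, height in merges:
    print(members, height)
```

Every cluster printed is the union of two clusters formed at earlier steps, which is the nesting property that distinguishes hierarchical methods from partitioning methods.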

In a geometric sense, every observation may be viewed as a point in
p-dimensional Euclidean space [Ref. 4: p. 127]. This swarm of data points may contain
dense regions or clouds of data points which are separable from other regions
containing a low density of points. These denser regions constitute what are known as
clusters. In one- and two-dimensional cases, it is easy to visualize and detect
clusters from scatter plots, assuming that clusters exist. In higher dimensions (which
will be used in this analysis) clustering becomes extremely difficult without the aid of a
computer.

Cluster analysis techniques have been applied in many fields of study. The
terminology, which differs from one field to another, is both voluminous and
diverse. “Numerical taxonomy” is frequently substituted for cluster analysis among
biologists, botanists, and ecologists, while some social scientists may refer to “typology.”
Other frequently encountered terms are pattern recognition and partitioning. While
techniques such as discriminant analysis have been studied by statisticians for nearly 45
years, cluster analysis has only recently come to statistical notice. Any method which
partitions a set of objects into subsets on the basis of measurements taken on every
object qualifies as a clustering method.


C. REVIEW OF LITERATURE 
1. Active Force Supply Studies 
The purpose of this review is to present variables which have been found to be
important in military accession research and to provide a better understanding of cluster
analysis. Understanding the importance of the variables in conjunction with cluster
analysis is critical in any eventual application of findings to the management of
military forces.

There is one problem which continually recurs in studies of military supply. 
Almost all studies use regression analysis to model supply levels. Because of data 
constraints the dependent variable used is a measure of enlistment contracts signed or 
accessions. The problem is that the variable which is being modelled, potential military
supply, is not always the same as the number of recruits enlisted. This is because the
services set quotas on enlistment levels. These quotas vary between services. Also,
within services, different quotas apply for different categories of recruits. This means
that the variable researchers are measuring is actually a function of both
potential supply and enlistment quota. The implication is that results of studies which
use demand constrained data do not accurately reflect the underlying relationships 
between the economic environment and potential supply. Methods which have been 
used to overcome this problem are discussed in more detail below. 
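
The effect of demand constraints on estimated relationships can be illustrated with a small simulation. The sketch below is not drawn from any of the studies reviewed; the slope, quota, and noise level are invented, and the only purpose is to show that when observed enlistments are capped at a quota, a regression on the observed data understates the response of potential supply.

```python
import random

def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
TRUE_SLOPE = 20.0   # hypothetical: recruits per point of unemployment
QUOTA = 250.0       # hypothetical enlistment quota per district

unemployment = [random.uniform(3.0, 12.0) for _ in range(500)]
potential = [100.0 + TRUE_SLOPE * u + random.gauss(0.0, 10.0)
             for u in unemployment]
# Observed enlistments can never exceed the quota set by the service.
observed = [min(p, QUOTA) for p in potential]

slope_potential = ols_slope(unemployment, potential)
slope_observed = ols_slope(unemployment, observed)
print(f"slope using potential supply: {slope_potential:.1f}")
print(f"slope using quota-capped data: {slope_observed:.1f}")
```

In this artificial setting the quota binds in roughly half of the districts, and the slope estimated from the capped data is well below the slope that generated potential supply.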

In a May 1985 study, Dertouzos pointed out that previous studies of factors 
influencing the supply of enlistments did not consider the effects of demand, such as 
the enlistment goals and incentives that are set up for recruiters to secure high quality 
recruits. His analysis demonstrates that enlistments are produced through the
simultaneous interaction of both supply and demand factors [Ref. 5: p. 3]. This
suggests that past research results that ignore demand are likely to have been flawed by
significant estimation biases. That is, changes in such factors as unemployment, relative
wage rates, and recruiting resources can affect enlistments more than past studies have 
indicated. 

Lawrence Goldberg conducted a comprehensive study which developed an
econometric supply model for all services using pooled time-series, cross-section
recruiting data from 1976 to 1980. The model was developed using log-linear ordinary
least squares regression. The dependent variable in the model was the number of
enlistments by male nonprior service (NPS) high school graduates (HSG). The model was
estimated separately for all HSG and for those in mental categories I-IIIA. The
independent variables in the model were:
* relative military / civilian pay
* civilian unemployment
* military education benefits
* expenditures on Federal youth employment programs
* race
* number of recruiters by service
* Navy advertising budget (other services' data were unavailable)

Goldberg handled the problem of demand constraints by focusing his analysis
on the results pertaining to the male high quality sample (i.e., mental category I-IIIA
HSG). This is a standard procedure in supply modelling. He claimed that this group is
rarely demand constrained and therefore his model should produce accurate results
[Ref. 6].

A study by Daula and Smith rejected Goldberg’s contention that using high
quality enlistee samples removes the problem of demand constraint contamination in
study results. They contend that even high quality groups may be demand constrained
in certain geographic areas [Ref. 7: p. 6]. To overcome this problem they partition
their data into two samples. One sample is data from areas where recruiting goals are
not met (i.e., supply constrained). The other sample is all the demand constrained data.
The total sample consists of time-series, cross-section data from 54 Army recruiting
districts by month from October 1980 to June 1983.

They included the following independent variables in a log-linear OLS
regression:
* military pay and bonuses
* civilian pay
* unemployment
* qualified military population
* percent minority
* percent voting Republican
* enlistment goals for all services
* number of Army recruiters
* levels of national and regional advertising

One important result from this study came from estimating the supply
function using only supply-constrained data but including the high quality enlistment
goal as an independent variable. The resulting coefficient of the goal variable was not
significantly different from zero, indicating that recruiters' goals have no effect on
enlistments in supply-constrained districts. This result supports the validity of Daula
and Smith's data partitioning methodology.
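
A log-linear OLS specification of the kind used in these supply studies regresses the logarithm of enlistments on the logarithms of the explanatory variables, so that each coefficient can be read as an elasticity. The following sketch is an illustration only: the data are simulated, the "true" elasticities of 0.8 and 0.5 are invented, and the small `ols` helper is a hypothetical stand-in for a statistical package.

```python
import math
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        p = a[c][c]
        a[c] = [v / p for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]

random.seed(1)
PAY_ELASTICITY, UNEMP_ELASTICITY = 0.8, 0.5   # invented values for the simulation
rows, log_enlistments = [], []
for _ in range(300):
    log_pay = math.log(random.uniform(0.8, 1.2))      # relative military/civilian pay
    log_unemp = math.log(random.uniform(4.0, 12.0))   # unemployment rate, percent
    rows.append([1.0, log_pay, log_unemp])
    log_enlistments.append(3.0 + PAY_ELASTICITY * log_pay
                           + UNEMP_ELASTICITY * log_unemp
                           + random.gauss(0.0, 0.02))

intercept, b_pay, b_unemp = ols(rows, log_enlistments)
print(f"pay elasticity ~ {b_pay:.2f}, unemployment elasticity ~ {b_unemp:.2f}")
```

Because both sides are in logarithms, the fitted coefficients recover the elasticities used to generate the data, which is the interpretation given to the coefficients in the studies above.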

In a subsequent study, Goldberg and Greenston reported on the updating and 
further development of a basic time series cross section model, which analyzes the 
supply of nonprior service, male, upper mental category enlistments. This study 
updates the data base to include FY 1983 observations, develops better measures of 
key variables (civilian pay, unemployment, and population), and reestimates the model
with data for the longer period FY 1976-1983 [Ref. 8: p. 61]. A major improvement
was in the use of unemployment data. They used an annual measure for each Navy
recruiting district (NRD) based on the aggregation of monthly county-level data from
the Bureau of Labor Statistics. The framework of the model is that
enlistment is viewed as an employment decision that is heavily influenced by economic
considerations. It is assumed that an individual compares two employment
alternatives--work in the military or work in the private sector--and chooses the one
that maximizes economic benefits. This implies that the enlistment propensity will
increase if there is an increase in the economic benefit of working in the military (such
as an increase in military pay) or a decline in economic benefits of working in the
private sector (such as an increase in unemployment). In addition, the authors assume
that enlistment supply in an NRD depends on the enlistment propensity of the district's
residents and on the number who are eligible for enlistment.

In the Goldberg and Greenston study, propensity and eligibility are influenced 
by various controllable and exogenous factors, which are grouped into broad 
categories: economic and demographic factors, recruiting resources, and policies. The 
economic factors include relative military pay, civilian unemployment, and GI Bill 
benefits. The demographic factors are the civilian male youth HSG and high school 
seniors population, racial mix, and urban/rural mix. The recruiting resources are 
recruiters of each service, and the recruiting policies considered are Air Force and 
Marine Corps changes in goals and standards. No additional consideration was given 
to the problem of demand constraints.

Results of active force supply studies are not directly applicable to the 
Reserves for several reasons. Among the reasons is the fact that the majority of 
reservists have a full time civilian job and participation in the reserves is a 
moonlighting decision. Another reason is that the Active force recruits and operates in a
national labor market. The Reserves, particularly the Army components, are forced to
operate in local labor markets. Across the U.S. the economic and demographic factors
that affect enlistments vary considerably among local areas.

A restriction to consideration at the local labor market level does not negate 
the importance of demand constraint in the Reserve supply modelling process. It is still 
plausible that in local areas potential supply may exceed recruiting quotas. A further 
complication is that the impact of quotas will be different across local labor markets 
because of the differences in the magnitudes of factors affecting potential supply. 

2. Reserve Supply 

There have been very few studies on Reserve enlistment supply since the 
introduction of the volunteer force. A few of them are discussed in the following 
paragraphs. 

Kelly, in 1979, estimated supply models for both NPS and prior service (PS) personnel
using total DoD accessions as his dependent variable and relative pay, unemployment and
population as independent variables. The analysis was disaggregated to the state level,
and he derived relative wage elasticities of .35 for PS supply and .10 for NPS supply
[Ref. 9].
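
To give a sense of what these elasticities imply, the sketch below computes the change in supply that would follow a 10 percent rise in relative pay. It assumes a constant-elasticity supply curve, which is an assumption made here for illustration and is not stated in the summary of Kelly's study; the function name is hypothetical.

```python
def supply_change(elasticity, pay_change):
    """Proportional change in supply under a constant-elasticity supply curve."""
    return (1.0 + pay_change) ** elasticity - 1.0

# Elasticities reported for prior service (PS) and nonprior service (NPS) supply.
for label, elasticity in (("PS", 0.35), ("NPS", 0.10)):
    pct = 100 * supply_change(elasticity, 0.10)
    print(f"{label}: 10 percent rise in relative pay -> {pct:.1f} percent rise in supply")
```

Under that assumption, PS supply responds by a little over three percent and NPS supply by about one percent, so both groups appear fairly unresponsive to pay.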

As part of a large study to investigate the impact of the all-volunteer force
on the Air Force Reserve, Rostker developed the moonlighting model. The model
characterizes the choice to work as a tradeoff between the individual’s desire for
income (from work) and leisure time [Ref. 10: p. 299]. In two subsequent studies,
McNaught reviewed the work of Rostker and Kelly and pointed out a number of
limitations and inconsistencies in their results. Combining those studies and the
moonlighting model, McNaught [Ref. 11: p. 12] conceptualized a theoretical model of
reserve supply where:
R = f(W, C, S, H, U, P, I, T, regional variables)

R = measure of reserve participation
W = Reserve wage
C = civilian primary wage
S = civilian secondary wage
H = hours worked on primary job
U = unemployment rate
P = population of eligible enlistees
I = stock of available information about Reserves
T = travel costs

The final argument is a set of regional variables.


McNaught’s final model recognizes the significance of available information 
about Reserve enlistment opportunities as a determinant of Reserve participation. 
Because of data restrictions, the model which McNaught estimates is much more 
restrictive than his theoretical model. Specifically, he disaggregates his data to the 
state level and includes no measure of travel cost (which is not reimbursable),
Reserve opportunity information (availability of better jobs), or recruiting goals. In his
estimation McNaught concentrates on NPS enlistments and he looks at total DoD
accessions without a separation by component. He estimates his model using logit
analysis with the ratio of the number of nonprior service accessions to qualified
population as
the dependent variable. This specification attempts to predict the probability of an 
individual with a given set of characteristics enlisting in the Reserves [Ref. 11: p. 36]. 
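
When the dependent variable is a ratio of accessions to qualified population, a logit specification typically works with the log-odds of that ratio, so that fitted values always map back to a probability between zero and one. The sketch below shows only this transformation; the state figures are invented, and whether McNaught's estimation took exactly this form is not stated in the summary above.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a linear index z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical state: 1,200 accessions from a qualified population of 400,000.
accessions, qualified = 1_200, 400_000
rate = accessions / qualified
z = logit(rate)
print(f"accession rate {rate:.4f} -> log-odds {z:.3f}")

# A change in the linear index always yields a valid probability.
for shift in (-1.0, 0.0, 1.0):
    print(f"index {z + shift:.3f} -> probability {inverse_logit(z + shift):.5f}")
```

The explanatory variables enter the model through the linear index, and the inverse transformation converts any value of that index into a predicted enlistment probability.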

Borack et al. (1985) list four criticisms of McNaught’s study:

(1) level of aggregation was too high

(2) lack of a measure of regional military interest

(3) no consideration of the interaction between Reserve and Active recruiting systems

(4) no consideration of the effect of local recruiting goals (demand) on enlistment
supply by geographical area.

[Ref. 12: p. 36]


Grissmer and Kirby, in an effort to help fill the gap in research on the Reserves,
analyzed the attrition and reenlistment decisions of NPS enlisted personnel in the
Army Reserve and Army National Guard [Ref. 13: p. 130]. They point out that reserve
participation resembles civilian moonlighting in some respects, but there are also some
major differences:

(1) Reservists are legally committed to their term of service;

(2) All reservists must leave their primary job for at least two weeks annually to
work full time on the Reserve job, and new nonprior service reservists must
additionally train full time for at least four months;

(3) Reservists drill a limited, specified number of hours and therefore do not have the
option of working more to earn more;

(4) The Reserve job offers nonpecuniary benefits such as specialized training, as well
as an environment that may generate camaraderie and a sense of team
accomplishment; and finally,

(5) Reservists receive other fringe benefits of military service such as educational
benefits and exchange privileges (while on annual training).

The reserve supply studies reviewed above are inconsistent and of
limited use in estimating the effect of policy and demographic changes on potential
supply. To improve the models the following considerations should be incorporated:

* Data should be analyzed at the lowest level possible (local Reserve Centers).

* The impact of recruitment goals and quotas should be included.

* Accessions should be modelled by individual Reserve component.

* Cross effects of own and other service, Active and Reserve recruiters should be included.

Theoretical analysis of the Reserve participation decision and a review of 
previous military supply studies suggest that a useful model of Reserve supply should 
explain the number of Reserve component accessions as a function of the following 
explanatory variables: 

Economic Factors

* local unemployment rates
* Reserve compensation
* Reserve benefits
* civilian primary job wages
* hours worked on primary job

Demographic Factors

* age
* race
* education
* family incomes
* family sizes
* distances to Reserve Centers

Recruitment Policies

* recruitment goals
* recruiters by component
* local military interest
* advertising effort

Restrictions on available data may limit the use of suggested variables in the
subsequent analysis.

3. Cluster Studies 

Although classification is a fundamental step in the process of scientific 
studies, different sciences have different problems to solve. In addition, classification 
often contains the concepts necessary for the development of theories within a science. 

“Cluster analysis” is the generic name for a wide variety of procedures that can 
be used to create a classification. These procedures empirically form clusters or groups 
of highly similar entities. More specifically, a clustering method is a multivariate 
statistical procedure that starts with a data set containing information about a sample 
of entities and attempts to reorganize these entities into relatively homogeneous
groups [Ref. 14: p. 7]. Clustering methods have been recognized throughout this
century, but most of the literature on cluster analysis has been written only during the 
past two decades. Cluster analysis has taken many forms and is often defined in many, 
sometimes contradictory, ways. Literature on cluster analysis can be found in a variety 
of journals, ranging from electrical engineering to biology to library science to 
psychiatry. 

The major stimulus for the development of clustering methods was a book 
entitled Principles of Numerical Taxonomy, published in 1963 by two biologists, Robert 
Sokal and Peter Sneath. Sokal and Sneath argued that an efficient procedure for the 
generation of biological classifications would be to gather all possible data on a set of 
organisms of interest, estimate the degree of similarity among these organisms, and use 
a clustering method to place relatively similar organisms into the same groups
[Ref. 14: p. 9, citing Sokal and Sneath, 1963]. Once groups of similar organisms were
found, the membership of each group could be analyzed to determine if they
represented different biological species. In effect, Sokal and Sneath assumed that
“pattern represented process”; that is, the patterns of observed differences and similarities
among organisms could be used as a basis for understanding the evolutionary process. 
The literature on cluster analysis exploded after the publication of the Sokal and 
Sneath book. 

Solomon [Ref. 15: p. 37] lists three major avenues of approach in solving a
clustering problem:

(1) Total enumeration of all data partitions and the subsequent selection of a good
or optimal clustering configuration;

(2) A stepwise clustering scheme that selects for each number of clusters the best
available groupings with the realization that it may ignore some good
configurations in the process;

(3) Reduction of multivariate data to two or three orthogonal dimensions,
producing a graphic or pictorial representation that permits visual clustering.
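
The first of these approaches, total enumeration, becomes impractical very quickly, and a short calculation shows why. The sketch below counts the number of ways n objects can be partitioned into k nonempty groups (Stirling numbers of the second kind) and the total number of partitions (Bell numbers); the function names are chosen here for illustration and are not Solomon's.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n objects into exactly k nonempty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # The n-th object either joins one of k existing groups or starts a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    """Total number of partitions of n objects into any number of groups."""
    return sum(stirling2(n, k) for k in range(n + 1))

for n in (5, 10, 25, 50):
    print(f"n = {n}: partitions into 3 groups = {stirling2(n, 3):,}; "
          f"all partitions = {bell(n):,}")
```

Even ten objects admit more than a hundred thousand distinct partitions, which is why stepwise schemes and dimension reduction are the approaches used in practice.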

An essential step in any of these approaches is representation of the data and 
establishment of measures of similarity. Since the choice of the variables to be studied, 
their interrelationships and the measures of similarity are the basis for any clustering 
scheme, much consideration must be given to ensure that “closeness” in the sense of 
the similarity measures indicates closeness in the sense of the objectives of a study. The 
simplest and most common measures of similarity are those which combine the effects 
of individual variables into a single number. This assumption of numerical 
comparability allows clustering processes that group objects by overall similarity. Ball 
[Ref. 16: p. 17] lists five types of similarity measures: 

(1) Association: The similarity between object X and object Y is the number or a 
function of the number of variables for which X and Y have the same response; 

(2) Correlation: Correlation between object X and object Y is a function of the 
angle between their respective vectors; it is most useful when a pattern of ratios 
of the variables is the prime determinant of similarity; 

(3) Distance: Many different distance measures are available. Weightings can be 
applied to absolute or Euclidean distances and can be derived either from an a
priori evaluation of each variable’s importance or from the data. Euclidean 
distances were emphasized by Ball; 

(4) Probabilistic: These measures are used primarily when it is appropriate to modify 
weights of the variables on the basis of population statistics; 

(5) Functional: For functional measures, the value of similarity is a function of the 
distance from other objects. 

When measures of similarity between objects have been established, the 
measures must be modified to provide meaningful similarity between groups of objects 
and between objects and groups. 
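
Three of the measure types listed by Ball can be written down directly. The sketch below is illustrative only: it uses a simple matching count for association, the cosine of the angle between object vectors as one possible correlation-type measure, and a weighted Euclidean distance. The function names and example vectors are assumptions made for this illustration.

```python
import math

def association(x, y):
    """Number of variables on which two objects give the same response."""
    return sum(1 for a, b in zip(x, y) if a == b)

def angle_cosine(x, y):
    """Cosine of the angle between two object vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def weighted_euclidean(x, y, weights):
    """Euclidean distance with a weight applied to each variable."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Two objects with the same pattern of ratios but different magnitudes.
x, y = (1.0, 2.0, 3.0), (2.0, 4.0, 6.0)
print("cosine:", round(angle_cosine(x, y), 6))
print("distance:", round(weighted_euclidean(x, y, (1.0, 1.0, 1.0)), 3))
print("matches:", association((1, 0, 1, 1), (1, 1, 1, 0)))
```

The two example objects are identical under the angle-based measure yet well separated under the distance measure, which illustrates why the choice of similarity measure has to reflect the objectives of the study.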

Alexander, in 1974, examined the relationships between the structure of
internal labor markets and the mobility, experience and income of workers [Ref. 17: p.
64]. In order to examine the relationships between structure and these variables, he
realized that a measure was required that would allow him to classify internal labor
markets. Most of the previous research in this area had been based on case studies in
which an industry was intensively analyzed and subjectively classified. One of the goals of
his work was to develop classification criteria of structures, based on objective and
comprehensive data, that were consistent with the results of case studies. Internal
labor markets have been classified according to many different schemes, but he utilized
Kerr's taxonomy of open, guild, and manorial markets. The open market is the
unstructured, competitive type. Guild-type markets are stratified horizontally. Manorial
markets emphasize attachment to the place of work and vertical stratification.
Alexander concluded that segmentation does exist because of institutional
characteristics.

Milligan, in 1980, conducted an evaluation of several clustering methods
[Ref. 18: p. 325]. He acknowledged that a general definition of cluster structure was
unlikely, but he offered one which involves two parts. Essentially, clusters should 
exhibit the properties of external isolation and internal cohesion. External isolation 
requires that entities in one cluster should be separated from entities in another cluster 
by fairly empty areas of space. Internal cohesion requires that entities within the same 
cluster should be similar to each other, at least within the local metric. This definition 
is similar to the concept of natural clusters. 
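
The two properties can be checked crudely for any proposed grouping by comparing the largest distance within a cluster to the smallest distance between clusters. The sketch below is a simplified illustration of that idea and is not Milligan's formal definition; the function name and the points are hypothetical.

```python
from itertools import combinations
from math import dist

def cohesion_and_isolation(clusters):
    """Largest within-cluster distance and smallest between-cluster distance."""
    within = max(dist(a, b)
                 for c in clusters for a, b in combinations(c, 2))
    between = min(dist(a, b)
                  for c1, c2 in combinations(clusters, 2)
                  for a in c1 for b in c2)
    return within, between

# Two hypothetical, well-separated groups of observations.
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
]
within, between = cohesion_and_isolation(clusters)
print(f"largest within-cluster distance: {within:.2f}")
print(f"smallest between-cluster distance: {between:.2f}")
```

When the smallest between-cluster distance is much larger than the largest within-cluster distance, as in this example, the grouping shows both internal cohesion and external isolation.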

To evaluate 15 clustering methods, he created a data set which would
naturally cluster. Then, in conjunction with the clustering methods, he added, one at a
time, six different error perturbations. These were:

(1) Error-free parent data sets.

(2) Data sets with outliers.

(3) Error perturbation of the distances.

(4) Addition of random noise dimensions.

(5) Computation of distances with noneuclidean index.

(6) Standardization of the variables.

The simulated data sets were clustered by eleven agglomerative hierarchical
algorithms and four nonhierarchical centroid sorting procedures. The methods are
listed in Table 3. The last four methods are nonhierarchical (k-means) centroid sorting 
procedures which produce only a single partition. 

TABLE 3
HIERARCHICAL METHODS AND K-MEANS ALGORITHMS

METHOD                               ERROR FREE    HIGH ERROR
Single Link                          .974          [illegible]
Complete Link                        .995          .880
Group Average                        [illegible]   .948
Weighted Average                     .994          .934
Centroid Method                      [illegible]   .810
Median Method                        .976          .831
Ward’s Minimum Variance              .987          .940
Beta-Flexible                        .997          .945
Average Link in the New Cluster      .985          .906
Minimum Total Sum of Squares         [illegible]   [illegible]
Minimum Average Sum of Squares       .993          .919
MacQueen’s Method                    .884          .842
Forgy’s Method                       .992          .872
Jancey's Method                      [illegible]   .909
Convergent K-means                   .903          [illegible]

Source: Milligan, G., An Examination of the Effect of Six Types of Error
Perturbation on Fifteen Clustering Algorithms,
Psychometrika, vol. 45, no. 3, September 1980, p. 332.

The set of methods was chosen primarily for three reasons. First, program
listings for the methods are generally available and can be adapted for many types of
clustering problems. Second, the methods are all fairly fast in terms of CPU time and
are economical for most applications. Third, some of the methods have been adapted
to handle very large data sets. He concluded that the results indicated the hierarchical
methods were differentially sensitive to the type of error perturbation. Also, he indicated
that the simulation results were promising and that a more detailed study of this and other
such indices should be undertaken.

Hodson [Ref. 19: p. 25], in 1983, employed a rigorous approach to defining
market sectors. He began with data on 40 characteristics of firms or industries,
encompassing firm size, productivity, unionization and various market measures, such
as government regulation and foreign involvement. Principal components analysis was
used to reduce the 40 variables to 25 factors. He then applied cluster analysis to the
factor scores from 202 industries to form 16 industry clusters. He collapsed the 16
clusters into six industry groups to facilitate an empirical analysis of earnings. The
final grouping was based on the author's judgement rather than on a clustering
algorithm. He criticized previous work on dual labor markets for relying on only two
groups. Hodson found that industry group affected earnings, even when worker
characteristics were held constant. His findings were inconsistent with other work by
sociologists and raised many questions on labor market theory.

As demonstrated in the review of cluster analysis, there is a diversity of
disciplines contributing to the literature. There is also a variety of methods lumped
under the term cluster analysis. This thesis will pursue methods relevant to the
Reserves, incorporating known econometric techniques and cluster analysis.


II. DATA/METHODOLOGY


A. DATA

The primary research question is to identify specific social and demographic 
factors among local geographic areas that can provide a basis for classifying Army 
Reserve Center markets into unique and homogeneous groups. Classifying 
characteristics are social and demographic factors related to the local labor market and 
recruiting success measures attributable to units attached to the reserve center. 

To conduct this study, data were extracted from the mass storage volume group 
at the U.S. Army Recruiting Command (USAREC). The file contains reserve 
accession counts and other accession variables for use in analysis. Another data file 
composed of demographic and local labor market factors has been created at the Naval 
Postgraduate School. This file was merged with the USAREC file to match accessions 
with local market data. The merger gave a final file which contained 967 records. 

Each record contains accession counts, occupation and industry counts, black
population percentages, unemployment figures, income, family size, rental, home value
information, recruiter counts, authorization data, military available figures, member
and unit strength data, and wage data. Unemployment, wage figures, and accession
counts are from 1985. This matched file will be the basis for the similarity analysis
conducted in this study. Table 4 identifies summary statistics for nonprior service and
prior service male and female reserve accessions. Later, in an effort to reduce the
sample size, a random sample within the range identified will be utilized. Accessions
for all reserve components totaled 306,108. Observations or cases are Reserve Centers.


For reading convenience, a sample of the range in variable values (minimum and
maximum values) is shown in Table 5. Further descriptions of these and other
variables are given in Appendices A-C. Each market is defined by local factors within a
35 mile radius. Because the reserve markets are defined geographically by distance,
there tends to be a wide disparity in characteristics. The file contains data on reserve
services other than Army, but this analysis will concentrate on a subset of data
representing accessions to the Army Reserves. Local labor market conditions are
likely to have a greater impact on Army Reserve supply due to the large number of
USAR Centers.

TABLE 4
1985 ACCESSION DATA

ANPSAA    ARMY NONPRIOR SERVICE MALE & FEMALE ALL AGES

ANPSAA    [illegible]
MEAN      [illegible]      STD DEV    [illegible]
MINIMUM   0                MAXIMUM    [illegible]

APSAA     ARMY PRIOR SERVICE MALE & FEMALE ALL AGES

APSAA     74858.0
MEAN      77.4             STD DEV    [illegible]
MINIMUM   0                MAXIMUM    [illegible]

VALID OBSERVATIONS = 967      MISSING OBSERVATIONS = 0





B. METHODOLOGY 

A focus of this thesis is to identify local labor market variables useful in forming
relatively homogeneous groups of Reserve Centers. The cluster analysis involves
multivariate statistical procedures. The heart of any multivariate analysis consists of
the data matrix (Table 6). This matrix is a table that gives a number of observations
on a number of variables simultaneously. For this study, observations are Reserve
Centers and variables are local labor market characteristics which affect supply and
characteristics of the Reserve Center. The following is a discussion of cluster analysis
methodology.

Clustering methods are used to discover structure in data that is not readily 
apparent by visual inspection or by appeal to other authority. The analysis is a two 
stage process. The first stage is to choose quantifiable attributes that describe the 
objects, and then use these attributes to measure the pair-wise dissimilarity among the 
objects. The second stage is to represent these dissimilarities by an appropriate 
classifying system or display. 

The input to cluster analysis is normally an n x p matrix of data: measurements
of p attributes for each of n objects. In this case it will be measurements of variable
characteristics for each Reserve Center local area. The output from cluster analysis
[Ref. 20: p. 47] is normally one of three displays:

* A hierarchical classification, commonly called a tree diagram or dendrogram;
* A partition of the objects into mutually exclusive sets, each set described by a
  profile or vector of p attribute values;
* A clumping of objects into sets that may overlap, each set again described by a
  profile.

In particular, the output should highlight mutual interaction among three
variables or more, just as easily as one can highlight a two way interaction. The value
of these outputs is that they summarize the original data objectively and they tend to
highlight subtle interactions in the original data, enabling a user to formulate
reasonable hypotheses about the interactions.

TABLE 5
LABOR MARKET SUMMARY DATA

Variable              Minimum        Maximum
NPS Male              0              [illegible]
QMA Male              1              11,384
% of Pop Black        0              100
% Wt M unemp          0              [illegible]
% Blk M unemp         0              [illegible]
Ave Fam incm          9,849          [illegible]
Ave Family size       3              [illegible]
Med Home Val          [illegible]    [illegible]
Med Home Rnt          98             368
% fam 2 Wrkrs         0              [illegible]
% Pop Chg 70-80       0              100
% Wkrs by industry
   Manufacturing      0              100
   Service            0              100
   Government         0              100
   Seasonal           0              100

TABLE 6
ILLUSTRATIVE DATA MATRIX

                              Variables
Observations     1      2      3      ...    j      ...    p
1                X11    X12    X13    ...    X1j    ...    X1p
2                X21    X22    X23    ...    X2j    ...    X2p
i                Xi1    Xi2    Xi3    ...    Xij    ...    Xip
n                Xn1    Xn2    Xn3    ...    Xnj    ...    Xnp

The recognition of things as similar or dissimilar is fundamental to the process
of classification [Ref. 21: p. 13]. Despite its apparent simplicity, the concept of
similarity, and especially the procedures used to measure similarity, are far from simple.
Similarity does not lie with the simple recognition that things are either alike or not
alike, but instead in the ways in which these concepts are expressed and implemented
in scientific research. To be successful, research has to be based upon objective
procedures. Cluster analysis is a result of this necessity.

Often the term "similarity coefficient" (or measure) is used to describe any type of
similarity measure. Sneath and Sokal (1973) subdivided these coefficients into four
groups:

(1) correlation coefficients,
(2) distance measures,
(3) association coefficients, and
(4) probabilistic similarity measures.

For this analysis, distance measures will be used.

The quantitative estimation of similarity has been dominated by the concept of
metrics. The following four criteria can be used to judge whether a nonnegative real
valued function d(x,y), used as a similarity measure, is a true distance function (or
metric).

(1) Symmetry. Given two entities, x and y, the distance, d, between them satisfies
    the expression
        $d(x,y) = d(y,x) \geq 0$
(2) Triangle inequality. Given three entities, x, y, z, the distances between them
    satisfy the expression
        $d(x,y) \leq d(x,z) + d(y,z)$
    This simply states that the length of any side of a triangle is equal to or less
    than the sum of the other two sides. This concept has also been called the
    metric inequality.
(3) Distinguishability of nonidenticals. Given two entities x and y,
        if $d(x,y) \neq 0$, then $x \neq y$
(4) Indistinguishability of identicals. For two identical elements, x and x',
        $d(x,x') = 0$
    That is, the distance between the two entities is zero.
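
As an illustration only, the four properties above can be checked numerically on a small set of points. The sketch below is not part of the original analysis: the function names `euclidean` and `is_metric_on`, the tolerance, and the sample points are all assumptions made for the example. Property (3) is tested in its usual reading, namely that distinct entities receive a nonzero distance.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def is_metric_on(points, d, tol=1e-9):
    """Check the four metric properties of d on a finite set of points."""
    for x, y in product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:      # (1) symmetry
            return False
        if x != y and d(x, y) <= tol:         # (3) distinct points are separated
            return False
        if x == y and abs(d(x, y)) > tol:     # (4) identical points have distance 0
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:  # (2) triangle inequality
            return False
    return True


# Hypothetical points, chosen only to exercise the checks.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, -2.0)]


def squared_euclidean(x, y):
    """Squared Euclidean distance; a dissimilarity, but not a metric."""
    return euclidean(x, y) ** 2


euclidean_ok = is_metric_on(points, euclidean)
squared_ok = is_metric_on(points, squared_euclidean)
```

On these points the Euclidean distance passes all four checks, while the squared Euclidean distance fails the triangle inequality, which shows that a useful dissimilarity measure need not be a metric.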

Because of their intuitive appeal, distance measures have enjoyed widespread
popularity. Technically, they are best described as dissimilarity measures; most of the
more popular coefficients demonstrate similarity by high values within their ranges, but
distance measures are scaled in the reverse. Two cases are identical if each one is
described by variables with the same magnitudes. In this case, the distance between
them is zero. Distance measures normally have no upper bounds, and are scale
dependent. The most commonly used distance is the Euclidean distance. For two cases
x and y measured on p variables, it is defined as:

    $d(x,y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$

The potential user of cluster analysis should be aware that many types of
similarity exist, and that while many of the coefficients and measures commonly used
in quantitative approaches to classification are metrics, there are alternatives to the use
of these measures that may be appropriate and necessary within the context of
research. Choosing a distance function is no less important than the choice of
variables to be used in the study. The choice of similarity measure should be
embedded ultimately within the design of research, which is itself determined by the
theoretical, practical, and philosophical context of the classification problem. A
Euclidean distance measure will be used in this study.

The selection of variables to be used with cluster analysis is one of the critical
steps in the research process. Ideally, variables should be chosen within the context of
an explicitly stated theory that is used to support the classification. The theory is the
basis for the rational choice of variables to be used in the study. Traditional theories of
labor market participation and the military manpower supply research previously
undertaken provide a starting point for identification of variables. From the literature
review, various economic, demographic, and recruitment variables are listed. Table 5
lists several candidate variables which could be used in cluster analysis. These
variables include average family income, percent black population, and civilian jobs in
the area.

It will be appropriate to standardize all of the variables used in this cluster
analysis. In most statistical analyses, the data are routinely standardized by some
appropriate method. If the normality of a variable is in question, a logarithmic or other
transformation is often performed. If the data are not of the same scale values, they
are commonly standardized to a mean of 0 and to unit variance. There is some
controversy as to whether standardization should be a routine procedure in cluster
analysis. Most of the literature argues convincingly that standardization is
inappropriate when the difference in scale between two variables may be intrinsic; but
no intrinsic differences seemed likely in the candidate variables used here. Users with
substantially different units of measurement will undoubtedly want to standardize
them, especially if a similarity measure such as Euclidean distance is to be used. The
decision to standardize should be made on a problem to problem basis, and users
should be aware that results can differ solely on the basis of this factor, although the
magnitude of the effect will vary from data set to data set. Using unstandardized
Euclidean distance in the current situation would clearly result in the dissimilarity
coefficient being driven by median home value and average family income, while
variables such as average family size would be ignored. Standardization also puts all
the variables in comparable units. Each variable used in this analysis will be
transformed to a Z-score variable. The Z-score variable transformation standardizes
variables with different observed scales to the same scale.

Other types of data transformation are possible, and many of these have been
used concurrently with cluster analysis. Factor analysis or principal components
analysis is often used when a researcher knows that the variables in the study are
highly correlated. The uncritical use of highly correlated variables to compute a
measure of similarity is essentially an implicit weighting of these variables. That is, if
three highly correlated variables are used, the effect is the same as using only one
variable that has a weight three times greater than any other variable.
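
The effect of the Z-score transformation on Euclidean distance can be sketched as follows. This is a minimal illustration, not the SPSSX procedure itself: the function names are hypothetical, the three market areas and their values are invented for the example, and the sample standard deviation (divisor n - 1) is assumed.

```python
import math


def z_scores(values):
    """Standardize a list of numbers to mean 0 and unit sample variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical values for three market areas (not taken from the thesis data).
home_value = [40000.0, 60000.0, 80000.0]
family_size = [2.5, 3.5, 3.0]

raw = list(zip(home_value, family_size))
standardized = list(zip(z_scores(home_value), z_scores(family_size)))

# Distance between the first two areas, before and after standardization.
raw_distance = euclidean(raw[0], raw[1])
std_distance = euclidean(standardized[0], standardized[1])
```

In the raw data the distance between the first two areas is almost entirely the 20,000 dollar gap in home value, and the difference in family size contributes essentially nothing. After standardization both variables contribute on the same scale.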

The data file used in this analysis contains at least 40 variables. Value listings are 
included in Appendix D. To make efficient use of the candidate variables, a factor 
analysis was run on all the variables at once. This analysis will utilize one 
representative variable from the various groups. Also, the number of Reserve Centers 
(967) in the data base will be scaled down to explore clusters associated with high 
accessions, low accessions, fill rates, take, and relative take (compared with National 
Guard accessions). At the same time, this will have the effect of reducing the data set 
to a more manageable size. 

The SPSSX information analysis system is a comprehensive tool for managing, 
analyzing, and displaying information. Its capabilities include hierarchical and 
nonhierarchical techniques. Hierarchical agglomerative methods have been dominant 
among the seven families of methods in terms of frequency of their applied use. In the 
agglomerative methods, one begins with N clusters; i.e., each observation constitutes its
own cluster. In successive steps the two closest clusters are combined, thus reducing 
the number of clusters by one in each step. 
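
The agglomerative procedure just described can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the SPSSX implementation: it uses average linkage between groups with Euclidean distance, and the function name `average_linkage` and the sample points are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(points):
    """Agglomerate points one merge at a time.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # start with N singleton clusters
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(euclidean(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((list(clusters[a]), list(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        # Combining the two closest clusters reduces the count by one.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical two-variable observations.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
history = average_linkage(points)
```

Five observations produce four merges, and the recorded merge distances are what a dendrogram displays along its scale.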

K-means clustering is a popular nonhierarchical clustering technique. For a
specified number of clusters K the basic algorithm proceeds in the following steps: 

(1) Divide the data into K initial clusters. The members of these clusters may be 
specified by the user or may be selected by the program, according to a 
predetermined procedure; 

(2) Calculate the means or centroids of each of the K clusters; 

(3) For a given case, calculate its distance to each centroid. If the case is closest to 
the centroid of its own cluster, leave it in that cluster; otherwise, reassign it to 
the cluster whose centroid is closest to it; 

(4) Repeat step 3 for each case;

(5) Repeat steps 2, 3, and 4 until no cases are reassigned.

For this analysis, the K-means clustering (nonhierarchical) technique will be used. This
technique is chosen, in part, because it can handle a large number of cases; a specified
number of clusters will be requested. In addition, hierarchical clustering will be used
on a smaller sample of the data set.
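
The five K-means steps listed above can be sketched as follows. This is a minimal illustration of the basic algorithm as described, not the Quick Cluster procedure in SPSSX: the function names, the `max_iter` safeguard, the sample data, and the starting partition are all assumptions made for the example.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centroid(rows):
    """Mean of each variable over the rows of one cluster."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def k_means(data, initial_labels, k, max_iter=100):
    """Basic K-means starting from a given initial partition (step 1)."""
    labels = list(initial_labels)
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster (None if it is empty).
        centroids = []
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            centroids.append(centroid(members) if members else None)
        # Steps 3 and 4: for each case, keep it or move it to the closest centroid.
        changed = False
        for i, row in enumerate(data):
            own = euclidean(row, centroids[labels[i]])
            for c, cen in enumerate(centroids):
                if cen is not None and euclidean(row, cen) < own:
                    own = euclidean(row, cen)
                    labels[i] = c
                    changed = True
        # Step 5: stop once a full pass reassigns no case.
        if not changed:
            break
    return labels, centroids


# Hypothetical data with a deliberately poor starting partition.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(data, [0, 1, 0, 1, 0, 1], k=2)
```

A case is moved only when another centroid is strictly closer than that of its own cluster, matching step 3. The result depends on the starting partition, which is why the same data can yield different final clusters under different initial assignments.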

Four factors appear to influence greatly the performance of clustering methods
[Ref. 22: p. 23]:

(1) elements of cluster structure,
(2) the presence of outliers and the degree of coverage required,
(3) the degree of cluster overlap, and
(4) choice of similarity measure.

To review the cluster methodology, the first step is to selectively reduce
the size of the data file. What results is a set of variables relevant to the reserves with
local characteristics. Next is the choice of a dissimilarity coefficient, and finally the
choice of a clustering algorithm.

III. CLUSTER RESULTS


A. CLASSIFICATION 

If manpower supply researchers were to try to classify Army Reserve
markets, they would probably link demographic, economic and recruiter factors in the
decision. These measurable factors would then be used to form a mathematical 
equation to predict and classify such things as accession counts, fill rates, and relative 
accessions. 

Cluster analysis has more potential as a factor in classification transformation. 
First of all, the ability to group two Reserve Centers together is intrinsic to every 
clustering algorithm (so long as the complete link-furthest neighbor sorting strategy is 
not used). Secondly, cluster analysis requires the user to define only a transformation
from measurable factors to a pair-wise dissimilarity coefficient rather than a
transformation from measurable factors to dependent measures.


B. ACCESSION COUNTS 
1. Nonhierarchical 
To demonstrate this application of cluster analysis, the following local area
factors were selected (utilizing theories from previous researchers) with which to
objectively classify Reserve Center markets into homogeneous groups by accession
counts:

(1) percentage manufacturing industry
(2) average family size
(3) primary male military available 17-21 years old
(4) unemployment - black male
(5) population change 1970-1980
(6) mean civilian wages
(7) military installations count.

Sample values of these and other variables are listed in Appendix E. Using 
the variables above, the data set was converted into clusters. Cluster results with 
measurement characteristics are summarized in Table 7.

The Reserve Center local market data have been clustered with the K-means
procedure of Quick Cluster in SPSSX. This method demonstrates the basic features of
nonhierarchical clustering methods. In the first step, preliminary calculations are
made, such as the variable means and standard deviations. Then, an initial partition of
the data is obtained with an internally generated starting partition, assigning the 967
market areas into three clusters (K = 3). The next step forms the initial cluster centers.
Each of the other observations is assigned to the nearest cluster. Euclidean distance is
used for this initial phase, and the cluster centroids are recomputed after each
observation is assigned to a group.

TABLE 7
[illegible] CLUSTER RESULTS

FINAL CLUSTER CENTERS

CLUSTER    ZUNEMPB        ZAVGFAMS       ZMANUFAC       ZWAGES
1          [illegible]    [illegible]    [illegible]    [illegible]
2          [illegible]    [illegible]    [illegible]    -.577
3          -.202          -.205          [illegible]    -.154

CLUSTER    ZPOPCHNG       ZMILINS        ZPMILAVA
1          [illegible]    [illegible]    1.704
2          [illegible]    -.505          -.407
3          .140           -.346          -.329

NUMBER OF CASES IN EACH CLUSTER

                          MEAN NPS
CLUSTER    CASES          ACCESSIONS
1          146            [illegible]
2          48             [illegible]
3          742            [illegible]
MISSING    [illegible]
TOTAL      [illegible]

After the initial solution has been found, the program advances to the iterative 
K-means phase. The distance from each observation to each cluster centroid is again 
computed, using the Euclidean distance criterion, and the assignment to the closest 
centroid is made and the centroid updated to reflect its new membership. After 
considering all observations in this manner, the new criterion value is checked for 
possible improvement during the K-means iteration. As long as the criterion value 
improves, the K-means procedure is repeated until final cluster centers are found. The
final cluster centers in Table 7 result from the variable means for the cases in the final
clusters.


Classification of Reserve Centers by local factors should depend primarily on
pair-wise data. The factor data were standardized into Z-values to get a meaningful
data set. Appendix E lists the variable transformations, and Appendix F lists the
Z-values.

As shown in Table 7, the three cluster iteration results in separate clusters of 
146, 48, and 742 local market areas. Table 7 also shows the mean nonprior service
accession counts ranging from 93 to 853 for each of the clusters. 

This suggests that clusters according to accession counts can be classified as:

* 146 high accession market areas (cluster 1);
* 742 medium accession market areas (cluster 3); and
* 48 low accession market areas (cluster 2).


There appears to be a significant difference in the average accessions from high market
areas, as opposed to accessions from the medium or low areas. A further investigation
of cluster membership reveals that for the high accession market areas, none of the 146
cases have accessions lower than 133, and the highest accession count for this cluster is
1903. Clusters two and three are not as distinct in grouping medium and low accession
markets, in that the range is from 6 to 361 for cluster 2, and 0 to 730 for cluster 3 (see
Appendix G for cluster statistics). This could indicate the existence of outliers in each
of the clusters.

A natural question to ask after observing the results of a cluster analysis is
what variables most strongly influence the clustering observed. A clue could be
provided by a look at the means and standard deviations of the cluster member
variables. Table 8 shows that cluster 1 is distinguished from the other clusters by
high average values in primary military available, mean wages, and military
installations count, and by a low value for population change. Primary military
available is almost ten times the average of the other clusters, as is military
installations count. It should be noted that these values also correspond (relatively)
to the values shown for final cluster centers in Table 7.

One major problem shared by all iterative methods is the problem of
suboptimal solutions. Since these methods can sample only a very small proportion of
all possible partitions of a data set, there is some possibility that a suboptimal partition
may be chosen. Unfortunately, there is really no objective way to determine if a
solution from an iterative partitioning method is globally optimal. One avenue of
solution to the problem, however, is to use the clustering method in conjunction with
an appropriate validation procedure. One validation procedure could be the use of
regression analysis.
2. Regression 

Multiple regression, the use of many independent variables to predict a
dependent variable, is probably the statistical technique most often used and
understood by managers. An attempt will be made here to develop a predicting
equation for the clustered nonprior service accession counts using the previous seven
variables (as independent or explanatory variables). Nonprior service accessions are
used as the dependent variable.
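
For readers who want to see the mechanics, a multiple regression of this kind can be sketched with ordinary least squares. The example below is only an illustration: it solves the normal equations directly, the function names `ols` and `r_squared` are hypothetical, and the two explanatory variables and their values are invented rather than taken from the Reserve Center file.

```python
def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def r_squared(X, y, beta):
    """Share of the variation in y explained by the fitted equation."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the response is built exactly as 2 + 3*x1 - 1*x2.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in X]

beta = ols(X, y)
fit = r_squared(X, y, beta)
```

Because the invented response is an exact linear function of the two variables, the sketch recovers the generating coefficients and an R square of essentially one. With real data the coefficients would be accompanied by t values and significance levels such as those reported in the regression tables.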

Results obtained from estimating the multiple regression model of nonprior
service accessions for all of the Reserve Center markets are shown in Table 9. All of
the variables are statistically significant at the 5% level. The variables primary military
available, population change, mean wages, and military installations count are
significant at the 1% level. This equation would suggest, in simple terms, that high
accessions would be found in market areas where the percentage manufacturing
industry, average family size, primary military available, mean wages, and military
installations count are relatively high, and where black male unemployment and
population change are low.

Accession equations were estimated for each of the three clusters and are shown
in Tables 10, 11, and 12. These results indicate that high accession market areas
(cluster 1) are significantly influenced by the variables primary military available, mean
wages, population change, and military installations count. Percentage manufacturing
industry, unemployment black male, and average family size are not significant for this
cluster. In addition, the sign of the coefficient for average family size changed from
positive to negative.

Medium accession market areas (cluster 3) are influenced significantly by
primary military available, average family size, black male unemployment, population
change, and military installations count. Percentage manufacturing industry is not
statistically significant and mean wages is not significant at the 5% level. For this
cluster, signs changed from positive to negative on coefficients for the variables
percentage manufacturing industry, mean wages, and military installations count. This
would imply that medium accession market areas would cluster where these factors are
low.

Table 11 shows that medium to low accession market areas (cluster 2) are
influenced significantly (1% level) by primary military available alone. Military
installation counts and black male unemployment are significant at the 10% level. For
this cluster, population change, mean wages, percentage manufacturing industry, and
average family size do not significantly influence the equation. Results for this equation
may be affected by a small sample (48).

Dependent Variable = Nonprior Service Accession Counts
N = 48
R SQUARE = .741

TABLE 12

3. Hierarchical

In this application of cluster analysis, the process yields a hierarchy of cluster
solutions, ranging from one overall cluster to as many clusters as there are cases. For
this reason the file had to be reduced. Clusters at a higher level can contain several
lower-level clusters, but within each level, the clusters are disjoint (each item belongs to
only one cluster).

This example was constructed from a random sample of 40 Reserve Center
local market areas out of 967. The same measurable factors (variables) used in the
previous section are used to cluster the markets. Results should be similar but not the
same.

Table 13 shows a list of results for the 3 cluster solution along with the actual
accessions for each case. In the 3 cluster solution, cluster membership is as follows:

cluster 1: 29 market areas, 158 average accessions
cluster 2: 8 market areas, 1095 average accessions
cluster 3: 3 market areas, 130 average accessions

Again these results suggest that local market areas according to accession counts can
be classified as:

* high accession market areas (cluster 2);
* medium accession market areas (cluster 3); and
* medium to low accession market areas (cluster 1).

Another output of hierarchical cluster analysis is the dendrogram. The key to
reading a dendrogram is the concept of cluster level. By specifying a cluster level, the
following information can be read from a dendrogram: the number of clusters and the
Reserve Center markets contained in each cluster. That is, there is a correspondence
from cluster level to a partition of the Reserve Centers. Figure 3.1 is a display in
graphic format (dendrogram) of the 40 market areas that were involved in the
hierarchical clustering.

[Dendrogram using average linkage (between groups); scale: rescaled distance at which
clusters combine]

Figure 3.1 Accession Dendrogram.


The scale at the top of the dendrogram is a cluster level scale. Note that the
maximum value of cluster level is 25 at the far right. This corresponds with the highest
similarity coefficient (see Appendix I). A low cluster level specifies a partition having
many small clusters while a high cluster level specifies a partition having a few large
clusters. Thus, cluster level can be thought of as a measure of the largest dissimilarity
(or, equivalently, the weakest bond) present within any cluster in the partition.
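
The correspondence between a cluster level and a partition can be sketched as follows. This is an illustration only: the function name `partition_at_level`, the representation of the merge history as (item, item, level) triples, and the six-item example are assumptions, and the levels shown are not read from Figure 3.1.

```python
def partition_at_level(n_items, merges, level):
    """Return the partition obtained by applying every merge at or below `level`.

    `merges` holds (item_a, item_b, merge_level) triples, meaning that the
    clusters containing item_a and item_b were combined at merge_level.
    """
    parent = list(range(n_items))

    def find(i):
        # Follow parent links to the representative of i's cluster.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, merge_level in merges:
        if merge_level <= level:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical merge history for six items on a 0 to 25 rescaled scale.
merges = [(0, 1, 0), (2, 3, 0), (1, 2, 11), (3, 4, 17)]

low = partition_at_level(6, merges, 0)
middle = partition_at_level(6, merges, 11)
high = partition_at_level(6, merges, 25)
```

Cutting at a low level leaves many small clusters, and raising the level admits weaker bonds, so clusters combine into fewer and larger groups.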

An extremely useful part of the output is the visibility of data on markets.
For example, consider cluster level 0, the minimum observed cluster level in the
dendrogram. At cluster level 0, the 40 Reserve Center markets are partitioned into 32
clusters. Twenty-eight of these clusters contain only a single Reserve Center. Two of
the 32 contain exactly 2 Reserve Centers and two clusters contain 4 Centers, identified
by zipcodes 19090, 19401, 19013, 19007, and 16602, 16652, 18702, 43326. Since 0 is the
minimum observed cluster level, we may conclude that the strongest possible bonds
exist within every cluster. Specifically, we may conclude that the Reserve Center
markets mentioned above are bound together by the tightest possible market ties. The
cluster analysis will not separate them even at the lowest cluster level.

Consider next a slightly higher cluster level, say 11. Here we are permitting
slightly weaker bonds to be present within clusters. We find that the 40 Reserve
Centers are now partitioned into 5 clusters. One of these clusters contains a single
Reserve Center [illegible] and one contains 3 Reserve Centers. In the 7 Reserve Center
cluster, 19090, 19401, 19013, and 19007 have been joined by 91105, 94165, and 98199.
It may be concluded that slightly weaker ties bind the new Reserve Centers to the
original four. Similar inferences can be drawn from other dendrograms, using
different measurable variables.

It is not until level 17 that a 3 cluster solution is apparent. The dendrogram
provides visibility to the broad scope of bonds that bring Reserve Center markets
together. From Table 14 one could deduce that it is the high mean values of primary
military available and military installations counts, along with low population change,
which cause markets 19090, 19401, 19013, and 19007 to join initially and form cluster
2. Market areas 83440, 84062, and 88001 cluster at level 10 and do not allow others to
join until the final one cluster solution. This may be cause for further investigation of
these markets.


C. FILL RATE
1. Nonhierarchical 
This section analyzes fill rate, defined as the number assigned divided by the
number authorized (assigned/authorized) for each Reserve Center market. Fill rates are
important indicators of unit readiness levels. To demonstrate the application of cluster
analysis for fill rate, a three cluster analysis for the total sample (967) was run. This is
a departure from the previous method because the lone measurable factor influencing
the cluster result is fill rate. Table 15 shows the results of this 3 cluster analysis.

TABLE 15
FILL RATE CLUSTER RESULTS

FINAL CLUSTER CENTERS

NUMBER FILLER STD

The mean fill rates in the three cluster solution suggest the expected fill rate
classification of:

* 8 high fill rate markets (cluster 1);
* 709 medium fill rate markets (cluster 3); and
* 247 low fill rate markets (cluster 2).


2. Regression 
To develop a predicting equation for fill rates, variables in the data file were
used to estimate a regression model for explaining variations in Reserve Center fill
rates. Final variables found to be statistically significant and used in the model are:

* unemployment-black males;
* population change;
* authorized billets; and
* number of USA (active) recruiters.

The regression model is shown in Table 16.











TABLE 16 
FILL RATE REGRESSION MODEL 

[Table rows illegible in source. Columns: explanatory variables, code, t value, prob. Variables: number of USA (active) recruiters, authorized billets, population change, and unemployment-black males.]
Dependent Variable = FILL RATE
N = 967


R SQUARE = .4

All of the variables used are significant at the 1% level except for black male
unemployment, which is significant at the 10% level. The results of this model suggest
that high fill rates can be explained by relatively high black male unemployment, a
relatively high number of active recruiters, and positive population growth. A relatively
low number of authorizations is also associated with high fill rates.
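
As an illustration of how a model of this form could be estimated, the following is a minimal sketch of ordinary least squares with t values and R square in plain Python. It assumes the model was fitted by ordinary least squares with an intercept, which the source does not state explicitly. The eight markets and all of their values are hypothetical, and the helper names `invert` and `ols` are introduced here for illustration only.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Least squares with an intercept; returns coefficients, t values, R square."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t_values = [beta[a] / (sigma2 * inv[a][a]) ** 0.5 for a in range(k)]
    return beta, t_values, 1.0 - sse / sst


# Hypothetical markets: (black male unemployment %, population change %,
# authorized billets, number of active recruiters).
markets = [
    (12.0, 5.0, 100, 4), (8.0, -2.0, 250, 2), (15.0, 10.0, 80, 6),
    (10.0, 1.0, 300, 3), (9.0, 3.0, 150, 5), (14.0, -1.0, 200, 2),
    (11.0, 7.0, 120, 4), (7.0, 0.0, 400, 1),
]
fill_rate = [1.10, 0.70, 1.30, 0.65, 1.00, 0.80, 1.05, 0.50]
beta, t_values, r_squared = ols(markets, fill_rate)
print(beta, t_values, r_squared)
```

The sign of each coefficient indicates the direction of association described in the text, and the t value is the coefficient divided by its estimated standard error.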

A separate regression of fill rates for the medium and low fill rate market areas
was calculated to see whether the nonhierarchical clusters formed using only fill rate as a
measurable variable could be explained by the independent regression variables. The high fill rate
cluster was not analyzed because of its small sample size (8). The results are shown in
Tables 17 and 18.
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LOW FILL RATE (CLUSTER 2) REGRESSION MODEL 
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Dependent Variable=FILL RATE 
N = 247 
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TABLE 18 
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R SQUARE = .61 
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These results are not encouraging as a predicting model for market fill rates. 


[Regression table row fragment: UNEMPLOYMENT-BLACK MALE; values illegible in source.]


Except for black male unemployment, which is significant at the 10% level, variables in
the low fill rate (cluster 2) regression model are not individually statistically significant
in explaining low fill rate markets. The cluster 3 regression model shows some promise
of being a good predictor, because authorized billets, population change, and number
of USA active recruiters are significant at the 1% level. Unemployment-black males is
not statistically significant for cluster 3.

A nonhierarchical cluster analysis using the four measurable variables for
predicting all fill rates yields the results shown in Table 19. The results show a much
narrower gap between the means for low and medium fill rate markets (.02), which
suggests there is not much difference between the two clusters. Also, the high fill
rate cluster (cluster 3) has a large number of market areas (N=91) compared to the
previous fill-rate-only clustering, where N=8. The disparity is likely caused by the lack
of significance in explaining individual clusters.





TABLE 19 
NONHIERARCHICAL CLUSTER CHARACTERISTICS
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3. Hierarchical 

To better understand the interactions of fill rate clusters, a hierarchical cluster
analysis using the reduced sample of 40 markets and fill rate as the lone measurable
variable is presented. These results are in Table 20.

Table 20 sheds some light on why there is a lack of explanation for fill rate
clusters. Cluster 2 contains only one market area, identified by zip code 32347, and
cluster 3 contains only two market areas, identified by zip codes 94965 and 43326.
The mean fill rates for the three clusters can be classified as:

* high (cluster 2 mean = 2.04),
* medium (cluster 1 mean = .97), and
* low (cluster 3 mean = .42).

However, a large portion (37) of the fill rate markets fall into the same cluster, indicating
there is no real distinguishable dissimilarity between fill rates in the markets. A clearer
picture can be viewed with the help of the dendrogram shown in Figure 3.2.

TABLE 20
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The dendrogram shows that on the lowest level, the 37 market areas 
immediately cluster together, while market areas 43326, 94965, and 32347 stay alone
as single-market clusters. It is not until the final level, when the clustering algorithm
forms a single cluster, that market area 32347 joins another market. This is a clear
indication that this particular market is an outlier and should be excluded from the
analysis.
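
The way an outlying market stays alone until the last merge can be sketched with a small average linkage (between groups) routine in Python. The six fill rates are hypothetical, the function name `average_linkage` is introduced for illustration, and absolute difference is used as the distance measure, which is an assumption; the source does not state which distance measure was used.

```python
def average_linkage(values):
    """Agglomerate 1-D values using average linkage between groups."""
    clusters = {i: [v] for i, v in enumerate(values)}
    schedule = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Mean of all pairwise distances between the two groups.
                dist = sum(abs(x - y) for x in clusters[a] for y in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        # The merged cluster keeps the smaller label.
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((a, b, dist))
    return schedule


# Hypothetical fill rates; the last market (index 5) is far from the rest.
fill_rates = [0.95, 1.00, 0.97, 0.40, 0.45, 2.04]
for stage, (a, b, dist) in enumerate(average_linkage(fill_rates), start=1):
    print(stage, a, b, round(dist, 4))
```

In this sketch the market with the very high fill rate is combined only at the final stage, which is the pattern the dendrogram shows for market area 32347.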
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DENDROGRAM USING AVERAGE LINKAGE (BETWEEN GROUPS) 
RESCALED DISTANCE CLUSTER COMBINE 
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Figure 3.2 Fill Rate Dendrogram. 


D. COMPETITIVE SUCCESS 
1. Nonhierarchical 
To demonstrate the application of cluster analysis to competitive success, a
three-cluster analysis of the total sample (967) was run. Competitive success is defined
as the number of Army Reserve accessions divided by the number of National Guard
accessions. By this definition, successful Army Reserve markets have high values of
the success variable. If the three-cluster solution separates high, low, and medium
success, one could infer a classification of clusters based on relative success. Results
of the clustering are shown in Table 21.

The mean success values in the three-cluster solution suggest the expected
success classifications of high, low, and medium, but cluster 3 contains only one
market area out of 959 (8 cases are missing). It is apparent that cluster 3 contains an
outlier market, which will not be considered further in this analysis. This leaves two
clusters to consider for classification (cluster 1 and cluster 2). The means of the two
clusters lend themselves to the following classification:

* high competitive success markets (cluster 2)
* moderate competitive success markets (cluster 1)
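
The competitive success measure defined above is a simple ratio, sketched below in Python. The market names and accession counts are hypothetical, and treating a market with zero National Guard accessions as a missing case is an assumption made here for illustration; the source does not say why 8 cases are missing.

```python
def competitive_success(usar_accessions, arng_accessions):
    """Army Reserve accessions divided by National Guard accessions."""
    if arng_accessions == 0:
        return None  # ratio undefined; treated here as a missing case
    return usar_accessions / arng_accessions


# Hypothetical markets: (Army Reserve accessions, National Guard accessions).
markets = {"A": (120, 40), "B": (60, 30), "C": (25, 0)}
ratios = {name: competitive_success(u, g) for name, (u, g) in markets.items()}
print(ratios)
```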


2. Regression 
Predicting equations for high competitive success markets and for moderate
competitive success markets were developed. Again, variables from the data file were
used to find a significant model for each of the classifications. Table 22 shows the
regression model results for predicting high competitive success markets. The following
variables were used because of their statistical significance:

* percentage manufacturing industry;
* median rent;
* average family size; and
* military installations count.

The variables median rent and military installations count are statistically
significant at the 5% level, while percentage manufacturing industry is significant at
the 1% level and average family size is significant at the 10% level. The coefficients of
all of the variables are negative, which suggests that high success rates correspond with, or
can be explained by, low or negative values for each variable.

Table 23 shows the variables used to develop a moderate competitive success
model:

* percentage government industry;
* mean wages;
* population change; and
* number of USAR recruiters.

As shown in Table 23, all of the variables used to explain moderate
competitive success are statistically significant at the 1% level. These markets, based
on the model, are likely to be located where the percentage of government industry is
low, mean wages are high, population change is negative or low, and the number of
Army Reserve recruiters is high.
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3. Hierarchical 

Table 24 shows a hierarchical cluster result using competitive success as the
lone measurable variable. The file was scaled down from 967 markets to a random
sample of 40 Reserve Center markets.

Cluster 3 contains only two market areas, identified by zip codes 77701 and
61614. Cluster 2 contains nine market areas and cluster 1 has 29 market areas. It is
clear that cluster 1 dominates this three-cluster solution. The markets in cluster 3, although
associated with high success values, appear to be outliers. The mean success value for
cluster 3 is 6.36, the mean for cluster 2 is 5.9, and the mean for cluster 1 is 2.75. If
cluster 3 is an outlier, these results correspond with the results obtained earlier using
the nonhierarchical method. Cluster 2 would be classified as high success markets and
cluster 1 would be classified as moderate success markets. Cluster 3 would be dropped
from the analysis. A clearer picture can be viewed with the help of the dendrogram shown
in Figure 3.3.

The dendrogram shows that on the lowest level, the 29 market areas of cluster
1 immediately cluster together. Market areas 61614 and 77701 join at level two and
remain apart from the others until the final level forms a single cluster. This is further
evidence that these two markets are outliers.
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TABLE 24 
HIERARCHICAL COMPETITIVE SUCCESS CLUSTERS

DENDROGRAM USING AVERAGE LINKAGE (BETWEEN GROUPS)
RESCALED DISTANCE CLUSTER COMBINE


Figure 3.3 Competitive Success Dendrogram.


E. OVERVIEW

The application of a particular clustering scheme to a particular set of data
involves assumptions about the appropriateness of the statistical and mathematical
techniques employed in the scheme. These assumptions are often difficult to justify,
and the researcher must rely to some extent on intuition and experience with the
characteristics of the objects under consideration. It would be unwise to accept these
results uncritically. It is possible that other approaches to cluster analysis would be
more appropriate or yield better results. However, each clustering scheme produced a
cluster of Reserve Center markets related to success, such as high accessions, high fill
rates, or high competitive success. Also, the different schemes do not yield a
consistent or discernible pattern across markets. High accession clusters are not the same
markets as high fill rate and/or high competitive success clusters.
Disappointing results from the regression models for fill rates do not mean that
cluster analysis is the wrong approach; indeed, they may mean that there were no predicting
variables in the data set, that fill rates are not predictable dependent variables, or
that other notions of similarity should be explored.

Accession counts were chosen for analysis first because of the abundance of 
literature and proven theories on the subject. The cluster results according to accession 
counts can assist Army Reserve recruiters in identifying markets where accessions 
would be expected to be high or low. Low accessions in medium market clusters may 
serve as criteria for future locations of Reserve units. Fill rates measure how successful 
the markets are in reaching their goals (authorizations). Cluster results may indicate
low fill rate market areas where recruiting resources should be increased or high fill rate
market areas where recruiting resources can be relaxed. Competitive success clusters can
assist policy makers in decisions on whether to expand units, where to locate new units,
and how to allocate recruiting resources. Moderate competitive success markets indicate that more
recruits are enlisting in the National Guard than in the Army Reserve. The
success of National Guard recruiting may indicate fertile ground for expanding current 
units or adding additional units. 

Overall, cluster analysis applied to accession modeling is very encouraging. Policy
makers can classify and identify Reserve Center markets according to accession counts,
fill rates, and competitive success. The classifications could play an important role in
the location of future Reserve units or in the expansion of current units. In view of the
encouraging results that have been achieved using this data base, a next step would be
to expand the base to provide new and different avenues for analysis. One avenue to
pursue would be to include in the accession data base data that are responsive to
changes in local areas, or variables that reflect the military propensity of local
markets. Willingness to serve can play an important role in the success or failure in
obtaining future accessions.
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IV. CONCLUSION 


This thesis has discussed the use of cluster analysis to group Reserve Centers 
based on the economic characteristics of Reserve Center local labor markets and 
characteristics of individual Reserve Centers. The analysis has presented three 
examples of cluster analysis in which Reserve Center market characteristics are treated 
as measurable objects. The results obtained in each demonstration are not presented as 
solid conclusions; they are implied incidentally while demonstrating applications of 
clustering methods to manpower problems. It is asserted that the methods used here 
are representative of a wide range of applications for cluster analysis in the area of 
manpower planning and Reserve Center classification. There is no “correct” way to
cluster data and a variety of methods are available, each requiring a different set of 
assumptions and utilizing different aspects of the measurements as the basis for 
discrimination between groups. 

Although cluster analysis was developed for the physical sciences, it can have a
wide range of applications in manpower analysis. Potential users should be aware of
three concluding precautionary generalizations outlined by Aldenderfer [Ref. 14: p.
15]:

(1) The strategy of cluster analysis is structure-seeking although its operation is
structure-imposing.
That is, clustering methods are used to discover structure in data that is not readily
apparent by visual inspection or other methods. Although the strategy of clustering
may be structure seeking, its operation is one that is structure imposing. A clustering
method will always place objects into groups, and these groups may be radically 
different in composition when different clustering methods are used. The key to using 
cluster analysis is knowing when these groups are real and not merely imposed on the 
data by the method. A number of validation procedures have been developed to 
provide some relief for this problem. 
(2) Cluster analysis methods have evolved from many disciplines and are inbred with 
the biases of these disciplines. 
Each discipline has its own biases and preferences as to the kinds of questions asked of 
the data, the types of data thought to be useful in building a classification, and the 
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structure of classifications thought to be useful. Since clustering methods are often no 
more than plausible rules for creating groups, manpower users must be aware of the 
biases that often accompany the presentation and description of a clustering method. 

(3) Cluster analysis methods are relatively simple procedures that, in many cases, are
not supported by an extensive body of statistical reasoning. 

In other words, many cluster analysis methods are heuristics (simple rules of thumb). 
They are little more than plausible algorithms that can be used to create clusters of 
cases. This stands in sharp contrast to factor analysis, for instance, which is based 
upon an extensive body of statistical reasoning. Although many clustering algorithms 
have important mathematical properties that have been explored in some detail, it is 
important to recognize the fundamental simplicity of many of the methods. In doing 
so, the user is far less likely to make the mistake of reifying the cluster solution. 

The three applications of cluster analysis used in this thesis suggest that:

(1) Reserve Center Markets according to accession counts can be classified and 
identified as: high accession market areas, medium accession market areas, and 
low accession market areas. 

(2) Reserve Center Markets according to fill rate can be classified as: high fill rate 
markets, medium fill rate markets, and low fill rate markets. 

(3) Reserve Center markets according to competitive success measures can be 
classified as: high competitive success markets, and moderate competitive 
success markets. 

Accession count clusters, fill rate clusters, and competitive success clusters can 
assist policy formulation in the areas of: location of new units, allocation of new 
authorizations, and allocation of recruiting resources. In addition, cluster analysis can
serve as a source for hypotheses about accessions which can be tested using regression
and other multivariate analysis.


APPENDIX A 
VARIABLE DEFINITION 


NPS Male/Female... Cumulative counts of all NPS accessions in the appropriate
category during FY 83-85 inclusive. NPS implies no prior service in any Active or
Reserve component.


QMA Male/Female... Qualified military available. This is a count of the male/female
population in the market area aged 17-29 years.


% of Population Black... Total number of blacks divided by the total population in 
each market in 1980. (all ages and sexes). 


Average Family Income... Average income accruing to all families in the market area 
from all sources in 1980. 


Average Family Size... Average number of family members in 1980. 
Median Home Value... Median value of all family homes in the market area in 1980. 
Median Home Rent... Median rent paid for all dwellings in the market area in 1980. 


% of Families with Dual Workers... Number of families with two or more members 
holding full or part time jobs in 1980. 


% Population Change... Total population figures for each market area. 
((1980-1970)/1970)x100. 


Manufacturing Workers... Proportion of workers reported in census classifications 
‘manufacturing’, ‘transport’ and ‘communications’ in 1980 in each market area. 


Service Workers... Proportion of workers reported in census classifications ‘wholesale’,
‘retail’, ‘finance’, ‘service’, ‘recreation’, ‘health’, ‘education’, and ‘other’ in 1980 in each
market area.


Government Workers... Proportion of workers reported in ‘government’ census
classification in 1980 in each market area.


Seasonal Workers... Proportion of workers reported in census classifications 
‘agriculture’ and ‘construction’ in 1980 in each market area. 
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APPENDIX B 
VARIABLE LISTING 
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APPENDIX C 
ACCESSION VARIABLES 
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APPENDIX D 
SAMPLE VALUE LISTINGS 
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APPENDIX E 
Z-SCORE TRANSFORMATION 
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APPENDIX F 
Z-SCORE VALUES 
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APPENDIX G 
ACCESSION CLUSTER STATISTICS 


Cluster 1 ARMY NONPRIOR SERVICE MALE & FEMALE ALL


MEAN 352.097 STD DEV 461.500
MINIMUM [illegible] MAXIMUM 1903
VALID OBSERVATIONS - 146 MISSING - 0


Cluster 2 ARMY NONPRIOR SERVICE MALE & FEMALE ALL


MEAN [illegible] STD DEV 82.579
MINIMUM 6 MAXIMUM 361
VALID OBSERVATIONS - 48 MISSING - 0


Cluster 3 ARMY NONPRIOR SERVICE MALE & FEMALE ALL


MEAN 122.016 STD DEV 122.542
MINIMUM 0 MAXIMUM 730
VALID OBSERVATIONS - 742 MISSING - 0
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APPENDIX H 
ACCESSION COUNT ANALYSIS OF VARIANCE CLUSTER VARIABLES 


ANALYSIS OF VARIANCE. 


VARIABLE    CLUSTER MS    DF            ERROR MS      DF            F             PROB
ZONEMES     56.8467       2             .8666         [illegible]   [illegible]   .000
ZAVGFAMS    [illegible]   2             [illegible]   [illegible]   315.5144      .000
ZMANUFAC    [illegible]   2             .9742         933.0         14.4859       .000
ZWAGES      [illegible]   [illegible]   [illegible]   [illegible]   104.2620      [illegible]
ZPOPCHNG    88.2305       2             .8200         [illegible]   107.6024      .000
ZMILINS     [illegible]   2             [illegible]   933.0         104.0553      .000
ZPMILAVA    [illegible]   2             [illegible]   [illegible]   [illegible]   .000
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APPENDIX I 
ACCESSION DISTANCE COEFFICIENTS 


AGGLOMERATION SCHEDULE USING AVERAGE LINKAGE (BETWEEN GROUPS)


         CLUSTERS COMBINED                         STAGE CLUSTER 1ST APPEARS    NEXT
STAGE    CLUST 1       CLUST 2       COEFFICIENT   CLUST 1       CLUST 2       STAGE
1        10            [illegible]   .149903       0             0             [illegible]
2        2             10            .283479       0             [illegible]   6
3        5             6             .493985       0             0             12
4        13            23            .526509       0             0             8
5        22            32            .710301       0             0             14
6        [illegible]   9             .798603       0             2             28
7        [illegible]   25            .861992       0             0             [illegible]
8        13            15            1.200522      [illegible]   0             24
9        [illegible]   28            1.221478      0             0             [illegible]
10       17            24            1.302960      0             0             [illegible]
11       30            40            1.310357      0             0             23
12       5             7             1.462651      2             [illegible]   17
13       [illegible]   18            [illegible]   0             0             21
14       [illegible]   [illegible]   1.660768      5             0             18
15       20            37            1.702411      0             0             23
16       [illegible]   26            2.268536      0             0             20
17       [illegible]   5             2.280144      9             12            [illegible]
18       [illegible]   [illegible]   2.648871      14            0             [illegible]
19       14            16            2.752981      0             0             29
20       [illegible]   [illegible]   3.005300      0             16            [illegible]
21       [illegible]   17            3.011827      13            10            24
22       [illegible]   3             3.151413      20            17            30
23       20            30            3.341523      15            [illegible]   [illegible]
24       12            [illegible]   3.391588      21            8             [illegible]
25       [illegible]   34            3.589299      0             [illegible]   [illegible]
26       38            [illegible]   4.019897      0             [illegible]   [illegible]
27       [illegible]   21            5.439097      24            0             [illegible]
28       [illegible]   36            5.448341      6             0             32
29       14            [illegible]   6.152546      [illegible]   18            30
30       1             14            6.985703      22            29            [illegible]
31       12            20            7.244295      27            [illegible]   34
32       8             38            8.211899      28            26            37
33       [illegible]   [illegible]   8.884380      30            0             36
34       [illegible]   19            9.122895      [illegible]   0             36
35       33            35            9.210579      25            0             39
36       1             [illegible]   11.836977     33            34            38
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